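A roadheader is an engineering robot widely used in underground engineering and the mining industry. Interactive dynamics simulation of the roadheader is a fundamental problem in unmanned excavation and virtual reality training. However, current research relies only on traditional animation techniques or commercial game engines. Few studies have applied real-time physics simulation from computer graphics to the field of roadheader robots. This paper presents a physically based simulation system for the roadheader robot. To this end, an improved multibody simulation method based on generalized coordinates is proposed. First, our simulation method describes the robot dynamics in generalized coordinates. Compared with state-of-the-art methods, our method is more stable and accurate. Numerical simulation results show that, for the same number of iterations, our method has significantly less error than the game engine. Second, we adopt a symplectic Euler integrator for the dynamics iteration instead of the traditional fourth-order Runge-Kutta (RK4) method. Compared with other integrators, our method is more stable in terms of energy drift during long-term simulation. Test results show that our system achieves real-time interactive performance at 60 frames per second (FPS). In addition, we propose a model format for implementing roadheader robot modeling in the system. Our interactive roadheader simulation system meets the requirements of interactivity, accuracy, and stability.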
translated by Google Translate
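The energy-drift behaviour mentioned in the roadheader abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's multibody system: it assumes a unit-mass, unit-stiffness harmonic oscillator and compares a symplectic (semi-implicit) Euler step with a plain explicit Euler step. It does not reproduce the RK4 comparison from the abstract, and the function names, step size, and step count are illustrative choices.

```python
def energy(x, v):
    """Total energy of a unit-mass, unit-stiffness oscillator (assumed toy model)."""
    return 0.5 * (x * x + v * v)


def symplectic_euler(x, v, dt, steps):
    """Semi-implicit Euler: update velocity first, then position with the NEW velocity."""
    energies = []
    for _ in range(steps):
        v -= dt * x          # acceleration of the oscillator is -x
        x += dt * v          # uses the freshly updated velocity
        energies.append(energy(x, v))
    return energies


def explicit_euler(x, v, dt, steps):
    """Explicit Euler: position and velocity are both advanced from the OLD state."""
    energies = []
    for _ in range(steps):
        x, v = x + dt * v, v - dt * x
        energies.append(energy(x, v))
    return energies


def max_drift(energies, e0):
    """Largest absolute deviation from the initial energy over the whole run."""
    return max(abs(e - e0) for e in energies)


if __name__ == "__main__":
    e0 = energy(1.0, 0.0)
    print("symplectic Euler drift:", max_drift(symplectic_euler(1.0, 0.0, 0.1, 1000), e0))
    print("explicit Euler drift:  ", max_drift(explicit_euler(1.0, 0.0, 0.1, 1000), e0))
```

In this toy setting the symplectic variant keeps the energy error bounded and oscillating, while the explicit variant's energy grows with every step. Whether the same margin holds for a full articulated robot model depends on the system and step size.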
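The irregularity and disorder of point clouds pose many challenges for point cloud analysis. PointMLP suggests that geometric information is not the only critical factor in point cloud analysis. It achieves promising results with a simple multi-layer perceptron (MLP) structure that uses a geometric affine module. However, these MLP-like structures aggregate features only with fixed weights, and they ignore differences in the semantic information of different point features. We therefore propose a new point-vector representation of point features that improves feature aggregation by using inductive bias. The direction of the introduced vector representation can dynamically modulate the aggregation of two point features according to their semantic relationship. Based on it, we design a novel Point2vector MLP architecture. Experiments show that it achieves state-of-the-art performance on the classification task of the ScanObjectNN dataset, a 1% increase over the previous best method. We hope our method can help people better understand the role of semantic information in point cloud analysis and lead to the exploration of more and better feature representations or other approaches.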
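Deep learning provides a powerful new approach to many computer vision tasks. Height prediction from aerial images is one of the tasks that has benefited greatly from the deployment of deep learning in place of older multi-view geometry techniques. This letter proposes a two-stage approach in which a multi-task neural network is first used to predict the height map from a single RGB aerial input image. We also include a second refinement step, which is used to produce a higher-quality height map. Experiments on two publicly available datasets show that our method is capable of producing state-of-the-art results. Code is available at https://github.com/melhousni/dsmnet.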
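As the autonomous driving industry slowly matures, visual map localization is quickly becoming the standard approach for localizing cars as accurately as possible. Owing to the rich data returned by visual sensors such as cameras or LiDARs, researchers are able to build different types of maps with various levels of detail, and use them to achieve high levels of vehicle localization accuracy and stability in urban environments. In contrast to the popular SLAM approaches, visual map localization relies on pre-built maps and aims only at improving localization accuracy by avoiding error accumulation or drift. We define visual map localization as a two-stage process. In the place recognition stage, the initial position of the vehicle in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions. Subsequently, in the map metric localization stage, the vehicle is tracked as it moves across the map by continuously aligning the visual sensor output with the current area of the map being traversed. In this paper, we survey, discuss, and compare the latest LiDAR-based, camera-based, and cross-modal methods for visual map localization in both stages, in order to highlight the strengths of each approach.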
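Point cloud semantic segmentation has attracted attention because of its robustness to lighting conditions. This makes it an ideal semantic solution for autonomous driving. However, given the heavy computational burden and bandwidth requirements of neural networks, placing all of the computation in the vehicle's electronic control unit (ECU) is neither efficient nor practical. In this paper, we propose a lightweight point cloud semantic segmentation network based on range view. Because it uses simple pre-processing and standard convolutions, it runs efficiently on a deep learning accelerator such as a DPU. Furthermore, a near-sensor computing system is built for autonomous vehicles. In this system, an FPGA-based deep learning accelerator core (DPU) is placed next to the LiDAR sensor to perform point cloud pre-processing and run the segmentation neural network. By leaving only the post-processing step to the ECU, this solution greatly reduces the ECU's computational burden and consequently shortens decision-making and vehicle reaction latency. Our semantic segmentation network achieves 10 frames per second (FPS) on a Xilinx DPU with a computation efficiency of 42.5 GOP/W.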
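The range-view pre-processing mentioned in the abstract above can be sketched as a spherical projection of LiDAR points onto a 2D range image. This is only an illustration of the commonly used projection, not the paper's implementation: the abstract does not specify the projection, and the image size (64 x 1024), the vertical field of view (+3 to -25 degrees), and the function names are all assumptions.

```python
import math


def project_point(x, y, z, height=64, width=1024,
                  fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one 3D point to (row, col, range) in a range image.

    The image size and field of view are hypothetical sensor parameters.
    Returns None for a point at the sensor origin, which has no direction.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None
    yaw = math.atan2(y, x)        # horizontal angle in (-pi, pi]
    pitch = math.asin(z / r)      # vertical angle
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    col = math.floor(0.5 * (1.0 - yaw / math.pi) * width)
    row = math.floor((fov_up - pitch) / (fov_up - fov_down) * height)
    # Clamp points that fall on or outside the image border.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col, r


def build_range_image(points, height=64, width=1024):
    """Fill a height x width image with the closest range per pixel (0.0 = empty)."""
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        hit = project_point(x, y, z, height, width)
        if hit is None:
            continue
        row, col, r = hit
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image


if __name__ == "__main__":
    cloud = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    img = build_range_image(cloud)
    print(project_point(1.0, 0.0, 0.0), img[6][512], img[6][256])
```

Once the cloud is a dense 2D image, it can be fed to standard 2D convolutions, which is one plausible reason such a representation suits convolution-oriented accelerators.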
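The growing number of Internet of Things (IoT) devices makes it imperative to understand the real threats they face in terms of cybersecurity. While honeypots have historically been used as decoy devices to help researchers and organizations better understand the dynamics of threats to a network and their impact, IoT devices pose a unique challenge for this purpose because of the variety of devices and their physical connections. In this work, by observing the behavior of real-world attackers in a low-interaction honeypot ecosystem, we (1) present a new approach to creating a multi-phased, multi-faceted honeypot ecosystem that gradually increases the sophistication of the honeypots' interactions with adversaries, (2) design and develop a low-interaction honeypot for cameras that allows researchers to gain a deeper understanding of what attackers are targeting, and (3) devise an innovative data analytics method to identify the goals of adversaries. Our honeypots have been active for three years. We were able to collect increasingly sophisticated attack data in each phase. Furthermore, our data analytics indicates that the vast majority of attack activity captured in the honeypots shares significant similarity and can be clustered and grouped to better understand the goals, patterns, and trends of IoT attacks in the wild.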
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, among others, revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification with the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
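The idea of distilling token relations, as opposed to raw features, can be sketched as follows. This is an illustrative reading of the abstract above, not the released TinyMIM code: it assumes the relation is a scaled dot-product softmax between token features and that teacher and student relations are matched with a KL divergence. All function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(queries, keys):
    """Row-wise softmax of scaled dot products: one N x N relation matrix."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def relation_distillation_loss(teacher_rel, student_rel):
    """Mean KL divergence between matching rows of teacher and student relations."""
    rows = [kl_divergence(t, s) for t, s in zip(teacher_rel, student_rel)]
    return sum(rows) / len(rows)


if __name__ == "__main__":
    # Three tokens each; the teacher is wider (4 dims) than the student (2 dims).
    teacher = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.1], [0.5, 0.5, 0.0, 0.9]]
    student = [[0.2, 0.1], [0.4, 0.3], [0.0, 0.5]]
    loss = relation_distillation_loss(
        token_relations(teacher, teacher), token_relations(student, student)
    )
    print("relation distillation loss:", loss)
```

One property of this formulation is that the relation matrices are N x N regardless of feature width, so a narrow student can be compared with a wide teacher without an extra projection layer.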
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the regression problem for BIQA harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the compared approaches, and that both the MS and PMT modules improve the model's performance.